11 research outputs found

    Mage - Reactive articulatory feature control of HMM-based parametric speech synthesis

    In this paper, we present the integration of articulatory control into MAGE, a framework for real-time, interactive (reactive) parametric speech synthesis using hidden Markov models (HMMs). MAGE is based on the HTS speech synthesis engine and uses acoustic features (spectrum and f0) to model and synthesize speech. In this work, we replace the standard acoustic models with models that combine acoustic and articulatory features, such as tongue, lip and jaw positions. We then use feature-space-switched articulatory-to-acoustic regression matrices to control the spectral acoustic features by manipulating the articulatory features. Combining this synthesis model with MAGE allows us to modify phones interactively and intuitively as they are synthesized in real time, for example transforming one phone into another, by controlling the configuration of the articulators in a visual display. Index Terms: speech synthesis, reactive, articulators
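The idea of switching between articulatory-to-acoustic regression matrices can be illustrated with a minimal sketch. This is not the authors' implementation: the nearest-centroid rule for picking a region, and the names `nearest_region`, `articulatory_to_acoustic`, `centroids`, `matrices` and `offsets`, are assumptions made only for illustration. The abstract states only that the regression matrices are switched according to the feature space.

```python
def nearest_region(x, centroids):
    """Return the index of the centroid closest to x (squared Euclidean distance).

    Assumption: regions of the articulatory feature space are represented by
    centroids; the paper may select the active region differently.
    """
    def sq_dist(c):
        return sum((xi - ci) ** 2 for xi, ci in zip(x, c))
    return min(range(len(centroids)), key=lambda k: sq_dist(centroids[k]))


def articulatory_to_acoustic(x, centroids, matrices, offsets):
    """Map articulatory features x to acoustic features y = A_k x + b_k,
    where k is the region selected for x."""
    k = nearest_region(x, centroids)
    a, b = matrices[k], offsets[k]
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(a, b)]


# Hypothetical two-region example with 2-D articulatory and acoustic features.
centroids = [[0.0, 0.0], [0.0, 2.0]]
matrices = [
    [[1.0, 0.0], [0.0, 1.0]],   # region 0: identity mapping
    [[1.0, 0.0], [0.0, 2.0]],   # region 1: second dimension scaled
]
offsets = [[0.0, 0.0], [0.5, 0.0]]

y = articulatory_to_acoustic([0.0, 1.5], centroids, matrices, offsets)
```

Moving the articulatory input across a region boundary changes which matrix is applied, which is what lets a change in articulator configuration reshape the spectral features.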

    Reactive Statistical Mapping: Towards the Sketching of Performative Control with Data

    Part 1: Fundamental Issues. This paper presents the results of our participation in the ninth eNTERFACE workshop on multimodal user interfaces. Our target for this workshop was to bring some technologies currently used in speech recognition and synthesis to a new level, i.e. to make them the core of a new HMM-based mapping system. We investigated statistical mapping, and more precisely how to use Gaussian Mixture Models and Hidden Markov Models for real-time, reactive generation of new trajectories from input labels and for real-time regression in a continuous-to-continuous use case. As a result, we have developed several proofs of concept, including an incremental speech synthesiser, software for exploring stylistic spaces for gait and facial motion in real time, reactive audiovisual laughter, and a prototype demonstrating the real-time reconstruction of lower-body gait motion strictly from upper-body motion, with conservation of the stylistic properties. This project has been the opportunity to formalise HMM-based mapping, integrate several of these innovations into the Mage library and explore the development of a real-time gesture recognition tool.

    Real-time voice pathology detection using autocorrelation and short-term jitter estimates

    Voice is the result of the coordination of the whole pneumophonoarticulatory apparatus. Voice pathologies have become a growing social concern, as voice and speech play an important role in certain professions and in the general quality of life of the population. Analysis of the voice allows diseases of the vocal apparatus to be detected and identified. Today, this identification is carried out by an expert doctor through a classical medical (ENT) examination, with invasive imaging methods, and with non-invasive methods based on acoustic analysis of the patient's speech. In recent years, emphasis has been placed on early detection of voice pathology, for which classical perturbation measurements (jitter, shimmer, HNR, etc.) have been used. Going one step further, the present work aims to implement and apply a real-time voice pathology detection system, combined with a Java interface.


    In this paper, we present a modified version of HTS, called performative HTS or pHTS. The objective of pHTS is to enhance the controllability and reactivity of HTS. pHTS reduces the phonetic context used for training the models and generates the speech parameters within a 2-label window. Speech waveforms are generated on the fly and the models can be reactively modified, affecting the synthesized speech with a delay of only one phoneme. It is shown that HTS and pHTS have comparable output quality. We use this new system to achieve reactive model interpolation and conduct a new test in which the articulation degree is modified within the sentence. Index Terms: speech synthesis, HTS, reactive control